library("here")
library("tidyverse")
library("sf")
library("rnaturalearth")
library("rcartocolor")
library("scico")
library("hexbin")
library("stars")
library("scales")
library("hexbin")
theme_set(theme_minimal())Maps
Setup
In class, I introduced the rnaturalearth package and how it provides nice country, state, and shoreline data for making maps.
Just like before, let’s obtain country data using the ne_countries() function.
world <- ne_countries(scale = "medium", returnclass = "sf")Choropleth Map: Population of African Countries
A choropleth is a very common style of showing spatial data in which we color individual regions in a map according to some data that we want to display. The country data from rnaturalearth contains lots of attribute columns, including estimates of population (the “pop_est” column). In this exercise, we want to make a choropleth map of the African continent showing the population of each country.
We can use filter on the world data, as we would on any other data frame. There’s a convenient column named “continent” that we can use for the filter.
africa <- filter(world, continent == "Africa")Problem 1: A Basic First Attempt
Creating the map is easy: just add in a call to geom_sf() with the africa data and map fill to pop_est. But we will want to fine-tune this map in various ways.
ggplot(africa) +
geom_sf(aes(fill = pop_est))Problem 2: Theme tweaks but still problems
Let’s improve the plot similar to how we styled the Katrina plot: blue for the surrounding water, a border around the map area, and the nice sequential “Mint” color scheme from the rcartocolor package. Let’s also add country labels, which are stored in a column called “name”, using the geom_sf_text() function.
Fill the countries usind the “Mint” color scheme from rcartocolor.
ggplot(africa) +
geom_sf(aes(fill = as.numeric(pop_est))) +
geom_sf_text(aes(label = name), size = 2.5) +
scale_fill_carto_c(palette = "Mint", labels = comma, name = "Population") +
labs(x = "Longitude", y = "Latitude") +
theme(panel.background = element_rect(fill = "aliceblue"),
panel.border = element_rect(fill = NA))That certainly looks better, but there are still perceptual problems with this map. In particular, Africa is not surrounded by endless ocean, and we have a lot of space on the bottom because of South Africa’s Prince Edward Islands, which are about half way between South Africa and Antarctica.
Problem 3: A Finished Choropleth Map
We can fix these issues by:
- Drawing the rest of the world underneath Africa
- Providing x and y limits to
coord_sf()(otherwise, by adding all other countries, we’ll see the entire world)
We can get the bounding box around our africa object like this:
st_bbox(africa) xmin ymin xmax ymax
-25.34155 -46.96289 51.39023 37.34038
Then, we can plug in those numbers for xlim and ylim in coord_sf(), shaving a bit off the ymin value so that the map doesn’t extend so far south. Finally, since this map is built from multiple data sets, we should now provide the data at the geom_ level.
Fill the countries using an appropriate color scheme of your choice.
Note that I have added lots of different theme and guide customization just to give you an idea of what’s possible.
ggplot() +
geom_sf(data = world, fill = "gray80") +
geom_sf(data = africa, aes(fill = as.numeric(pop_est)/1e6)) +
geom_sf_text(data = africa, aes(label = name), size = 2.5) +
scale_fill_carto_c(palette = "Mint",
labels = comma,
name = "Population (millions)",
limits = c(0, 200),
breaks = seq(0, 200, by = 50)) +
coord_sf(xlim = c(-25.34155, 51.39023),
ylim = c(-37, 37.34038)) +
labs(x = "Longitude", y = "Latitude") +
theme(panel.background = element_rect(fill = "aliceblue"),
panel.border = element_rect(fill = NA),
legend.title.align = 0.5,
legend.background = element_rect(fill = "white"),
legend.text.align = 0.5,
legend.justification = c(0, 0),
legend.position = c(0.02, 0.1)) +
guides(fill = guide_colorbar(direction = "horizontal",
label.position = "bottom",
title.position = "top",
ticks = FALSE,
barwidth = grid::unit(2.5, "in"),
barheight = grid::unit(0.15, "in")))Faceted Species Distributions
You might already have geospatial data that you want to map in the form of a shapefile (a file with the .shp extension). For example, the IUCN provides spatial data on the known geographic ranges of many plant and animal species. These are huge downloads, but I have prepared a small subset for you called iucn_primates.shp that contains only non-human primates. We can use the read_sf() function from the sf package to read in the data:
primates <- read_sf(here("assignments/week-08/data/iucn_primates.shp"))
primatesSimple feature collection with 417 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -99.2274 ymin: -34.83983 xmax: 141.9292 ymax: 41.54623
Geodetic CRS: WGS 84
# A tibble: 417 × 3
BINOMIAL FAKE_ID geometry
<chr> <dbl> <MULTIPOLYGON [°]>
1 Allenopithecus nigroviridis 1 (((22.49295 2.161146, 22.5617 2.039266, …
2 Allocebus trichotis 2 (((49.9361 -14.38416, 49.9741 -14.46636,…
3 Alouatta arctoidea 3 (((-69.4574 6.108936, -69.476 6.099036, …
4 Alouatta belzebul 4 (((-35.0807 -6.206164, -35.0746 -6.22086…
5 Alouatta caraya 5 (((-42.2595 -10.18856, -42.4535 -11.1191…
6 Alouatta discolor 6 (((-51.7193 -3.243664, -51.7056 -3.30786…
7 Alouatta guariba 7 (((-38.3472 -12.43216, -38.1769 -12.6876…
8 Alouatta juara 8 (((-72.2923 11.45834, -72.1034 11.39704,…
9 Alouatta macconnelli 9 (((-57.1405 5.849836, -57.1373 5.835936,…
10 Alouatta nigerrima 10 (((-55.1808 -2.579764, -55.1808 -2.72346…
# ℹ 407 more rows
This is stored as a tidy data frame, similar to other data that we have worked with, with a column for each attribute. But there is an additional column called “geometry” that contains the relevant spatial information, along with some spatial metadata such as the geographic projection and the bounding box.
Since it functions as a normal data frame, we can perform operations on it, such as filtering for particular species (the column in the data that contains species is called “BINOMIAL”). Let’s get the range for the white-faced capuchin, Cebus capucinus, and also get the bounding box for this species.
# C. capucinus range
c_capucinus <- filter(primates, BINOMIAL == "Cebus capucinus")
# C. capucinus bbox
cc_bbox <- st_bbox(c_capucinus)To avoid having to manually write in the xlim and ylim values, as we did for the Africa example above, we can refer to the cc_bbox object that we created, referencing the specific values that we want to extract.
# Range map for C. capucinus
ggplot() +
geom_sf(data = world, fill = "gray95") +
geom_sf(data = c_capucinus, fill = "darkorchid", alpha = 0.5) +
geom_sf_label(data = world, aes(label = name)) +
coord_sf(xlim = cc_bbox[c("xmin", "xmax")],
ylim = cc_bbox[c("ymin", "ymax")]) +
labs(x = "Longitude", y = "Latitude") +
guides(fill = "none") +
theme(panel.background = element_rect(fill = "aliceblue"),
panel.border = element_rect(fill = NA))Problem 4: Filter and bounding box for gibbon distribution data
And of course, we can do all the other nifty things that we do with normal ggplot2 plots, such as faceting. Let’s filter so that we pull out all gibbon species belonging to the genus Hylobates, and then create a faceted map of their geographic ranges. The str_detect() function is part of the stringr package, which is a part of the tidyverse. It tests for the presence of a sequence of characters, and returns TRUE if the sequence is detected. In this case, the filter statement below tests each value in BINOMIAL, and retain rows where the string “Hylobates” was detected.
# Gibbon ranges
gibbons <- filter(primates, str_detect(BINOMIAL, "Hylobates"))
# Gibbon bbox
gib_bbox <- st_bbox(gibbons)Problem 5: Gibbon species distribution maps
Note how in this plot I have italicized the facet titles using strip.text = element_text(face = "italic") in the theme() function.
Use a different fill for each gibbon species, and use the “Dark2” palette from RColorBrewer.
# Faceted distribution maps
ggplot() +
geom_sf(data = world, fill = "gray95") +
geom_sf(data = gibbons, aes(fill = BINOMIAL),
alpha = 0.75, color = NA) +
coord_sf(xlim = gib_bbox[c("xmin", "xmax")],
ylim = gib_bbox[c("ymin", "ymax")]) +
labs(x = "Longitude", y = "Latitude") +
scale_fill_brewer(palette = "Dark2") +
guides(fill = "none") +
facet_wrap(vars(BINOMIAL), nrow = 2) +
theme(panel.background = element_rect(fill = "aliceblue"),
panel.border = element_rect(fill = NA),
strip.text = element_text(face = "italic"))Mapping small-scale linear and point features
Another spatial task that you might be faced with is plotting location points relative to other spatial features in some small geographic area. For example, you might plot the locations of artifacts collected at an archaeological site in relation to roads, buildings, etc. In this example, we’re going to plot GPS location points of five groups of capuchin monkeys over a 1-year time period along with streams, trails, and roads to gain insights about their movement patterns.
The data include:
- CSV file of ranging data that include X-Y coordinates of each GPS point
- A shapefile of researcher trails in the study area
- A shapefile of streams in the region
- A shapefile of a paved road leading into the study area
Read in the data layers:
# CSV point layer
cap_df <- read_csv(here("assignments/week-08/data/cap-ranging.csv"))
# Shapefiles
trails <- read_sf(here("assignments/week-08/data/trails.shp"))
streams <- read_sf(here("assignments/week-08/data/streams.shp"))
roads <- read_sf(here("assignments/week-08/data/roads.shp"))
knitr::kable(head(cap_df))| focal_group | season | year | date | waypoint | seqnum | activity | altitude | lon | lat |
|---|---|---|---|---|---|---|---|---|---|
| Los Valles | DC | 2011 | 1/6/2011 15:15:58 | R_110106_LV_FC_001 | 1 | FI | 300.3 | -85.61092 | 10.84384 |
| Los Valles | DC | 2011 | 1/6/2011 16:01:39 | R_110106_LV_FC_002 | 1 | SP | 286.0 | -85.61085 | 10.84282 |
| Los Valles | DC | 2011 | 1/6/2011 16:16:29 | R_110106_LV_FC_004 | 2 | FO | 293.5 | -85.61038 | 10.84256 |
| Los Valles | DC | 2011 | 1/6/2011 16:31:26 | R_110106_LV_FC_005 | 3 | FF | 302.4 | -85.61033 | 10.84112 |
| Los Valles | DC | 2011 | 1/6/2011 16:45:30 | R_110106_LV_FC_006 | 4 | TT | 302.3 | -85.61114 | 10.84031 |
| Los Valles | DC | 2011 | 1/6/2011 17:00:08 | R_110106_LV_FC_007 | 5 | TT | 312.7 | -85.61131 | 10.83832 |
Note that the cap_df data are simply a CSV table, whereas the other three data sets are spatial objects. The cap_df data therefore don’t have spatial geometry that can be understood by the geom_sf() function.
To create a basic plot that includes all four data sets, we could just plot the cap_df data as normal X-Y points using geom_point(), but as we’ll see later, there are advantages to making them a spatial object. For now, we’re not going to set limits using coord_sf().
Here’s our basic starting plot:
ggplot() +
geom_point(data = cap_df,
aes(x = lon, y = lat, color = focal_group),
size = 0.5, alpha = 0.5) +
geom_sf(data = roads, linewidth = 1) +
geom_sf(data = streams, color = "skyblue") +
geom_sf(data = trails, color = "gray30", linewidth = 0.1) +
theme(panel.border = element_rect(fill = NA),
panel.background = element_rect(fill = "gray98"))All the points are concentrated in one relatively small region of the study area. We can’t use the bounding box of any of our spatial features (trails, roads, streams), as we have previously done, because they all extend well beyond the points. To zoom in on the points, we could manually set the limits using trial-and-error, or we could convert the points into a spatial feature and then extract the bounding box of the points. Let’s do the latter.
In the capuchin ranging data, our x-y values are longitude and latitude coordinates on the WGS84 global reference system. The standard way of referring to this very common coordinate system is by its EPSG identifier, which is 4326. Explaining all this in detail is beyond the scope of this class, but it was summarized in one of your assigned readings.
To make a spatial feature out of the columns in our data, we use the function st_as_sf() (from the sf package), and we must supply it with the x and y columns, as well as the coordinate reference system. After that, we can get the bounding box.
cap_sf <- st_as_sf(cap_df, coords = c("lon", "lat"), crs = 4326)
cap_bbox <- st_bbox(cap_sf)Problem 6: Zoomed-in plot, also faceted by group
Now we can replace the geom_point() stuff in the code above with geom_sf(...) to get exactly the same plot. Remember that in geom_sf(), we don’t need to provide x and x mappings, but if we want colored points, we still need to provide the color mapping. Also remember that our points data set is now cap_sf rather than cap_df.
Since the coordinates aren’t really useful here, given how small the area is, let’s just turn off the graticule and lat/long values along the axes by providing using datum = NA inside of coord_sf().
Facet the map by group and show each group in a different color using the rcartocolor palette “Bold”.
ggplot() +
geom_sf(data = cap_sf, aes(color = focal_group),
size = 0.5) +
geom_sf(data = roads, linewidth = 1) +
geom_sf(data = streams, color = "skyblue") +
geom_sf(data = trails, color = "gray30", linewidth = 0.25) +
scale_color_carto_d(palette = "Bold") +
coord_sf(xlim = cap_bbox[c("xmin", "xmax")],
ylim = cap_bbox[c("ymin", "ymax")],
datum = NA) +
guides(color = "none") +
labs(x = NULL, y = NULL) +
theme(panel.border = element_rect(fill = NA),
panel.background = element_rect(fill = "gray98")) +
facet_wrap(vars(focal_group))Problem 7: Kernel density estimates
We can also make use of R’s various analysis capabilities to make interesting plots from dense point data, such as 2D kernel density estimates of home ranges for each group. Since these are basic ggplot2 geoms rather than spatial geoms, we need to go back to using the normal (non-spatial) X-Y data set called cap_df.
Fill the density polygons using the rcartocolor palette “SunsetDark”.
ggplot() +
stat_density_2d(data = cap_df,
aes(x = lon, y = lat, fill = stat(level)),
geom = "polygon") +
geom_sf(data = roads, linewidth = 1) +
geom_sf(data = streams, color = "skyblue") +
geom_sf(data = trails, color = "gray30", linewidth = 0.25) +
scale_fill_carto_c(palette = "SunsetDark") +
coord_sf(xlim = cap_bbox[c("xmin", "xmax")],
ylim = cap_bbox[c("ymin", "ymax")],
datum = NA) +
guides(fill = "none") +
labs(x = NULL, y = NULL) +
theme(panel.border = element_rect(fill = NA),
panel.background = element_rect(fill = "gray98")) +
facet_wrap(vars(focal_group))Warning: `stat(level)` was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(level)` instead.
Problem 8: Hex bin densities
Or a hexagonal heatmap of 2D bin counts.
Fill the hexagons using the rcartocolor palette “PurpOr”.
ggplot() +
stat_bin_hex(data = cap_df,
aes(x = lon, y = lat, fill = stat(density)),
color = "black", linewidth = 0.1) +
geom_sf(data = roads, linewidth = 1) +
geom_sf(data = streams, color = "skyblue") +
geom_sf(data = trails, color = "gray30", linewidth = 0.25) +
scale_fill_carto_c(palette = "PurpOr") +
coord_sf(xlim = cap_bbox[c("xmin", "xmax")],
ylim = cap_bbox[c("ymin", "ymax")],
datum = NA) +
guides(fill = "none") +
labs(x = NULL, y = NULL) +
theme(panel.border = element_rect(fill = NA),
panel.background = element_rect(fill = "gray98")) +
facet_wrap(vars(focal_group))Combining vector and raster data
In the last set of problems, we will use the stars package to read in and display a geospatial raster data, and combine this with the other features that we’ve been mapping.
Problem 9: Landcover as background
First, let’s read in a raster file of land cover categories in the same study area as above. We’ll use this as the background (bottom layer) for the map. The raster file is a GeoTiff called LC_20110306.tif.
After reading in the file, we need to make two small modifications (I’ll provide the code). First, the cell value 0 actually represents missing data, so we should replace 0s with NAs. Second, we need to convert the numeric cell values (which are actually category IDs, with values 1, 2, 3, or 4) to a factor to use a discrete fill scale.
The land cover categories are as follows:
- 1 = Open grassland
- 2 = Early-stage successional forest (cleared in last few decades but regrowing)
- 3 = Intermediate-stage successional forest
- 4 = Mature or old-growth forest
lc <- read_stars(here("assignments/week-08/data/LC_20110306.tif"))
lc <- lc %>%
mutate(LC_20110306.tif = na_if(LC_20110306.tif, 0),
LC_20110306.tif = factor(LC_20110306.tif))Now let’s create the plot, putting the landcover map on the bottom. I used the colorbrewer palette called “YlGn”, which seems to fit the land cover categories well.
ggplot() +
geom_stars(data = lc) +
geom_sf(data = cap_sf, color = "darkorchid", size = 0.5, alpha = 0.5) +
geom_sf(data = roads, linewidth = 1) +
geom_sf(data = streams, color = "skyblue") +
geom_sf(data = trails, color = "black", linewidth = 0.25) +
scale_fill_brewer(palette = "YlGn") +
coord_sf(xlim = cap_bbox[c("xmin", "xmax")],
ylim = cap_bbox[c("ymin", "ymax")],
datum = NA) +
guides(fill = "none") +
labs(x = NULL, y = NULL) +
facet_wrap(vars(focal_group)) +
theme(panel.border = element_rect(fill = NA),
panel.background = element_rect(fill = "gray95"))Warning: Removed 935325 rows containing missing values (`geom_raster()`).
Problem 10: Elevation data as the basemap
Elevation is another kind of raster data that is commonly used for mapping. Here, we read in an elevation data of the study area that I downloaded from the freely available Shuttle Radar Topography Mission data products.
elev <- read_stars(here("assignments/week-08/data/acg_elevation.tif"))Because this covers a much larger area of Central America, we need to crop it to the study area. There’s a convenient function for this, st_crop(), that simply takes a spatial layer and a bounding box as its arguments, and it crops the layer to that box. Recall that we already created a suitable bounding box called cap_bbox.
elev_cropped <- elev |>
st_crop(cap_bbox)Finally, let’s create a plot, using the following modifications to improve the display and interpretation of the elevation data:
- Raster layer: use the cropped elevation data.
- Point layer: reduce the size and add more transparency to the points to reduce its visual weight using
size = 0.25andalpha = 0.25. - Density layer: Add white contour lines calculated from the points using
geom_density_2d()(use problem 7 as a guide, but we just want contour lines in this case rather than filled polygons). - Remove the anthropogenic features (trails and roads) from the map.
- Reduce the linewidth of the streams to 0.25.
- Use an appropriate scale function to fill the raster using the palette “oleron” from the
scicopackage, which is specially crafted for elevation and bathymetric (i.e., ocean depth) data. The first half of the palette is a gradient of blues for the ocean, and the second half is a gradient from greens to browns for land. To make the palette start with the land colors, add the optionbegin = 0.5in your scale function so that it only takes colors from the second half of the palette.
ggplot() +
geom_stars(data = elev_cropped) +
geom_sf(data = cap_sf, color = "black", size = 0.25, alpha = 0.25) +
geom_density2d(data = cap_df,
aes(x = lon, y = lat), color = "white") +
geom_sf(data = streams, color = "skyblue", linewidth = 0.25) +
scale_fill_scico(palette = "oleron", begin = 0.5) +
coord_sf(xlim = cap_bbox[c("xmin", "xmax")],
ylim = cap_bbox[c("ymin", "ymax")],
datum = NA) +
guides(fill = "none") +
labs(x = NULL, y = NULL) +
facet_wrap(vars(focal_group)) +
theme(panel.border = element_rect(fill = NA),
panel.background = element_rect(fill = "gray95"))